1 Introduction



In this report, building on the deterministic multi-valued one-to-many Byzantine agreement
(broadcast) algorithm in our recent technical report [2], we introduce a deterministic
multi-valued all-to-all Byzantine agreement (consensus) algorithm with linear communication
complexity per bit agreed upon. The discussion in this note is not self-contained and relies
heavily on the material in [2]; please refer to [2] for the necessary background.

Consider a synchronous fully connected network with n nodes, numbered 0, 1, ..., n - 1. At
most t < n/3 nodes can be faulty. Every node i is given an initial value of L bits. The goal of
a consensus algorithm is to allow each node to decide (or agree) on a value consisting of L bits,
while satisfying the following three requirements:

• Every fault-free node eventually decides on a value (termination); 

• The decided values of all fault-free nodes are equal (consistency); 

• If every fault-free node holds the same initial value v, the decided value equals v (validity).

Our algorithm achieves consensus on a long value of L bits deterministically. Similar to
the one-to-many algorithm in [2], the proposed all-to-all Byzantine agreement (or consensus)
algorithm progresses in generations. In each generation, D bits are agreed upon, so the
total number of generations is L/D. For convenience, we assume L to be an integral
multiple of D.

2 Consensus Algorithm 

In the proposed consensus algorithm, we use the same "diagnosis graph" technique as in [2] to
narrow down the locations of faulty nodes. If a node y is accused by at least t + 1 other
nodes, y must be faulty. It is then isolated, and does not perform the algorithm below. When
a new node is isolated, n effectively decreases by 1, and t also decreases by 1. For the reduced
network, the condition n > 3t continues to hold if it held previously. In the following,
we consider the reduced network with the reduced values of n and t, and assume that no node in
the reduced network is accused by more than t nodes. In what follows, "network" refers to the
reduced network.
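The isolation rule can be sketched in a few lines of Python. This is a minimal illustration only: the name `isolate_accused` and the representation of accusations as (accuser, accused) pairs are assumptions made here, and the diagnosis graph of [2] carries more structure than this. The sketch also assumes that accusations from already isolated nodes are ignored and that the threshold uses the reduced value of t.

```python
def isolate_accused(nodes, accusations, t):
    """Remove every node accused by at least t + 1 other remaining nodes.

    `accusations` is a hypothetical set of (accuser, accused) pairs.
    Each isolation reduces both the node count and t by 1, so the
    threshold is re-evaluated against the reduced t.
    Returns (remaining_nodes, reduced_t).
    """
    remaining = set(nodes)
    changed = True
    while changed:
        changed = False
        for y in sorted(remaining):
            accusers = {a for (a, b) in accusations
                        if b == y and a != y and a in remaining}
            if len(accusers) >= t + 1:
                remaining.discard(y)  # y must be faulty
                t -= 1                # one fewer potential fault remains
                changed = True
                break
    return remaining, t


remaining, reduced_t = isolate_accused(range(7), {(0, 6), (1, 6), (2, 6)}, t=2)
```

In this example node 6 is accused by three nodes with t = 2, so it is isolated and the reduced network has 6 nodes with t = 1, which still satisfies n > 3t.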

The following steps are performed for the D bits of information of the current generation.
Let the D bits at node i be denoted as $v_i$.

Step 1 This step is performed by each node i: We use an (n, n - 2t) distance-(2t + 1) code, wherein
each codeword consists of n symbols, each symbol of size D/(n - 2t) bits. Such a code
exists, provided that the symbol size is large enough. Let us denote this code as $C_{2t}$.
With a symbol size of D/(n - 2t) bits, the D-bit value at node i can be viewed as (n - 2t)
symbols. Encode $v_i$ into a codeword $S_i$ from the code $C_{2t}$. The j-th symbol in the codeword
is denoted as $s_{ij}$. Send $s_{ii}$ to all other nodes that it trusts. Thus, node i sends the i-th
symbol of its codeword to all nodes that it trusts.
Note for future reference: Since code $C_{2t}$ has distance 2t + 1, any punctured
(n - z, n - 2t) code obtained from $C_{2t}$ has distance 2t + 1 - z, where z < 2t. Let $C_t$ denote the
punctured (n - t, n - 2t) code of distance t + 1 obtained by removing the last t symbols
of the original (n, n - 2t) code above. By "last" t symbols, we refer to symbols with index
n - t through n - 1.
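To make the distance claims concrete, the following sketch uses a Reed-Solomon style code over a small prime field as one possible instance of an (n, n - 2t) distance-(2t + 1) code. The choice of a prime field GF(p), the evaluation points 0, ..., n - 1, and the names `rs_encode` and `min_distance` are assumptions made for illustration; the report only requires that some code with these parameters exists. The brute-force search is feasible only for tiny parameters.

```python
from itertools import product


def rs_encode(message, n, p):
    """Encode k = len(message) symbols of GF(p) into n symbols.

    The codeword is the evaluation at x = 0, 1, ..., n - 1 of the
    polynomial whose coefficients are the message symbols.
    Requires a prime p >= n so that the evaluation points are distinct.
    """
    return [sum(c * pow(x, e, p) for e, c in enumerate(message)) % p
            for x in range(n)]


def min_distance(n, k, p, keep=None):
    """Brute-force minimum distance, optionally after puncturing the code
    to its first `keep` symbols. For a linear code this equals the
    minimum weight over all nonzero codewords."""
    keep = n if keep is None else keep
    best = keep
    for message in product(range(p), repeat=k):
        if any(message):
            weight = sum(s != 0 for s in rs_encode(message, n, p)[:keep])
            best = min(best, weight)
    return best


n, t, p = 7, 2, 11
k = n - 2 * t                                  # n - 2t message symbols
full = min_distance(n, k, p)                   # distance of the full code
punctured = min_distance(n, k, p, keep=n - t)  # last t symbols removed
```

For n = 7 and t = 2, `full` comes out as 2t + 1 = 5 and `punctured` as t + 1 = 3, matching the note above.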

Step 2 This step is performed by each node i: Denote by $r_{ij}$ the symbol received from node j
in step 1. If i trusts j and $r_{ij} = s_{ij}$, then set $M_{ij}$ = TRUE; else $M_{ij}$ = FALSE. $M_i$ is
a "match" vector, and records whether i's information matches the symbols sent by
the other nodes.
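As a small illustration of step 2, the match vector at one node could be computed as below. The function name `match_vector`, the dictionary of received symbols, and the set of trusted nodes are hypothetical representations chosen here. The entry for j = i is not used by the later steps and is set to True purely by convention in this sketch.

```python
def match_vector(i, own_codeword, received, trusted, n):
    """Return the match vector of node i as a list of n booleans.

    own_codeword[j] is the j-th symbol of node i's own codeword.
    received[j] is the symbol node j sent to i in step 1 (missing if
    nothing was received). trusted is the set of nodes that i trusts.
    """
    return [
        j == i or (j in trusted and received.get(j) == own_codeword[j])
        for j in range(n)
    ]


# Node 0 trusts nodes 1 and 2 only; node 2 sent a mismatching symbol.
M0 = match_vector(0, [5, 6, 7, 8], {1: 6, 2: 9, 3: 8}, {1, 2}, 4)
```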

Step 3 Each node i uses a traditional Byzantine agreement (one-to-many) algorithm to broadcast
$M_i$ to all the nodes.

Step 4 Now each node i has received $M_j$ from each node j. Due to the use of BA in the previous
step, all fault-free nodes receive identical M vectors. Each node i attempts to find a set
X containing exactly (n - t) nodes that are "collectively consistent". That is, for every
pair of nodes $j, k \in X$, $M_{jk} = M_{kj}$ = TRUE.

There are two cases: 

• No such subset X exists: Note that if all fault-free nodes (at least n - t of them
exist) have identical initial values, then a set X must exist. (Fault-free nodes always
trust each other.) Thus, if no such X exists, the fault-free nodes do not all have
identical values. In that case, the fault-free nodes can agree on a default value
and terminate the algorithm.

• At least one such subset X exists: In this case, all fault-free nodes identify one
such set X using a deterministic algorithm (thus, all fault-free nodes identify the same
X). Since all fault-free nodes can compute X identically, without loss of generality,
suppose that X contains nodes 0 through (n - t - 1). Thus, the nodes not in X are
n - t through n - 1. (In other words, the nodes are renumbered after X is computed.)
Thus,

$$X = \{0, 1, \ldots, n - t - 1\}$$

and define

$$\bar{X} = \{n - t, \ldots, n - 1\}.$$

Let the (n - t)-symbol received vector at node i consisting of the symbols received
from the (n - t) nodes in X be called $R_i$.¹

Since X contains n - t nodes and there are at most t faulty nodes, at least n - 2t ≥ 2
of these nodes must be fault-free. Consider two fault-free nodes j and k in X. By
definition of $R_j$ and $R_k$, nodes j and k find these vectors "consistent" with their own
values $v_j$ and $v_k$, respectively. In other words, $R_j$ and $R_k$ are codewords in $C_t$.
There are at least n - 2t fault-free nodes in X, which must have sent the same symbols
to nodes j and k in step 1. Thus, the (n - t)-symbol vectors $R_j$ and $R_k$ must be
identical in at least (n - 2t) positions, and differ in at most t positions.

1 In vector $R_i$, the symbols are arranged in increasing order of the identifiers of the nodes that sent them.
Given that (i) $C_t$ is a distance-(t + 1) code, (ii) $R_j$ and $R_k$ are both codewords in $C_t$,
and (iii) $R_j$ and $R_k$ differ in at most t positions, it follows that $R_j$ and $R_k$ must be
identical. This, in turn, implies that $v_j$ and $v_k$ must be identical as well. This proves
the following claim:

Claim 1: All fault-free nodes in X have identical D-bit values. In other words, for
all fault-free nodes $j, k \in X$, $v_j = v_k$.
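The search for X in step 4 can be sketched as follows. The report only requires that X be found by some deterministic algorithm; the exhaustive search in lexicographic order used here, and the name `find_consistent_set`, are assumptions made for illustration. The search is exponential in n in the worst case, so this is not meant as an efficient procedure.

```python
from itertools import combinations


def find_consistent_set(M, n, t):
    """Return the lexicographically first tuple X of n - t nodes such that
    M[j][k] and M[k][j] are both True for every pair j != k in X.
    Returns None if no such set exists.

    M is the n x n table of match vectors agreed upon in step 3, so
    every fault-free node running this search obtains the same result.
    """
    for candidate in combinations(range(n), n - t):
        if all(M[j][k] and M[k][j] for j, k in combinations(candidate, 2)):
            return candidate
    return None


# n = 4, t = 1: node 0 reports a mismatch with node 3.
M = [[True] * 4 for _ in range(4)]
M[0][3] = False
X = find_consistent_set(M, 4, 1)
```

If the function returns None, the fault-free nodes fall back to the default value as described in the first case above.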

Step 5 Now consider a node $y \in \bar{X}$. Identify any node $z_y$ in X such that $z_y$ and y trust each
other. If no such $z_y$ exists, that implies that y is accused by all n - t > t nodes in X, and
therefore, y must be already identified as faulty, and must have been isolated previously.
Thus, $z_y$ exists.

For each $y \in \bar{X}$, node $z_y$ transmits the t symbols $s_{z_y(n-t)}$ through $s_{z_y(n-1)}$ to node y.² Each
fault-free node $y \in \bar{X}$ forms a vector using the (n - t) symbols $r_{y0}, \ldots, r_{y(n-t-1)}$ received in
step 1, and the above t symbols received from node $z_y$. Suppose that the n-symbol vector
thus formed at fault-free node y is denoted $F_y$.

Failure detection rule: If $F_y$ is not a valid codeword from the (n, n - 2t) code $C_{2t}$,
then node y detects a failure. This failure observation is distributed to other nodes in the
next step. (Justification for this failure detection mechanism is presented below.)
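The failure detection rule amounts to a codeword membership test. The sketch below assumes, purely for illustration, that the (n, n - 2t) code is a Reed-Solomon style code over a prime field GF(p) with evaluation points 0, ..., n - 1; the names `rs_encode` and `is_codeword` are hypothetical. Under that assumption, a vector is a codeword exactly when the polynomial interpolated from its first n - 2t symbols reproduces the remaining symbols.

```python
def rs_encode(message, n, p):
    """Evaluate the message polynomial at x = 0, ..., n - 1 over GF(p)."""
    return [sum(c * pow(x, e, p) for e, c in enumerate(message)) % p
            for x in range(n)]


def is_codeword(vector, k, p):
    """True iff `vector` is the evaluation at x = 0, ..., len(vector) - 1
    of some polynomial of degree < k over GF(p), with p prime."""
    def interpolate_at(x):
        # Lagrange interpolation through the first k symbols.
        total = 0
        for a in range(k):
            num, den = 1, 1
            for b in range(k):
                if b != a:
                    num = num * (x - b) % p
                    den = den * (a - b) % p
            # pow(den, p - 2, p) is the inverse of den modulo the prime p.
            total = (total + vector[a] * num * pow(den, p - 2, p)) % p
        return total

    return all(interpolate_at(x) == vector[x] % p
               for x in range(k, len(vector)))


n, t, p = 7, 2, 11
k = n - 2 * t
S = rs_encode([3, 1, 4], n, p)
# A faulty relay alters one of the last t symbols before forwarding them.
F_bad = S[:n - 1] + [(S[n - 1] + 1) % p]
ok_good = is_codeword(S, k, p)
ok_bad = is_codeword(F_bad, k, p)
```

Here `ok_good` is True and `ok_bad` is False: since the code has distance 2t + 1, altering between 1 and 2t symbols of a codeword can never produce another codeword.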

Step 6 All nodes in $\bar{X}$ broadcast (using a traditional BA algorithm) a single-bit notification
announcing whether they detected a failure in the above step.

Decision rule: If no failure detection is announced by anyone, then each fault-free node
i in X decides (agrees) on its own value $v_i$, and each fault-free node $j \in \bar{X}$ decides on the
value corresponding to the codeword $F_j$.

If a failure is detected by anyone, then the failure is narrowed down using a "full broadcast"
procedure described in [2], and agreement on the D bits is also achieved as a part of this
full broadcast. The diagnosis graph is updated, and we return to step 1 for the next set of D
bits.

Justification for the failure detection and decision rules: Consider any fault-free node
$i \in X$ and any fault-free node $y \in \bar{X}$. Now let us compare $F_y$ with $S_i$.

• Consider the first n - t symbols of these vectors (elements with index 0 through n - t - 1).
Observe that, for fault-free node $i \in X$, $r_{ij} = s_{ij}$ for 0 ≤ j ≤ n - t - 1, by definition of
set X. Since at least n - 2t of the symbols with index < n - t come from fault-free nodes,
$F_y$ can differ from $R_i$ (and $S_i$) in at most t positions with index < n - t.

• Consider the last t symbols of vectors $F_y$ and $S_i$. Since $z_y$ may be faulty and could have
sent t arbitrary symbols to y in step 5, vectors $F_y$ and $S_i$ may differ in all of these t
positions.

2 For the complexity analysis presented later, note that there are t nodes in $\bar{X}$, each of which is sent t symbols,
each consisting of D/(n - 2t) bits.
Thus, $S_i$ and $F_y$ may differ in at most 2t positions. Now let us make two observations:

• Observation 1: By definition of $S_i$, $S_i$ is a valid codeword from the distance-(2t + 1) code
$C_{2t}$. Since $F_y$ differs from the valid codeword $S_i$ in at most 2t places, it follows that: either (i)
$S_i = F_y$ (and both are valid codewords), or (ii) $F_y$ is not a valid codeword.

• Observation 2: To derive this observation, consider the case where all the nodes in X
are fault-free. Clearly, in this case, all these nodes must have the same value (from Claim 1
above). Then $S_i$ is identical for all $i \in X$, and thus the (fault-free) nodes in X send symbols
consistent with the common value in step 1 to the nodes in $\bar{X}$. $F_y$ ($y \in \bar{X}$) consists entirely
of symbols sent to it by nodes in X. Thus, clearly, $F_y$ will be equal to $S_i$ for all $i \in X$ when
all nodes in X are fault-free. It then follows that $F_y$ is a codeword from the (n, n - 2t)
code $C_{2t}$ when all nodes in X are fault-free.

The above two observations imply that: (i) if $F_y$ is not a codeword, then not all the nodes in X
can be fault-free (that is, at least one of these nodes must have behaved incorrectly), and
(ii) if $F_y$ is a codeword, then the value corresponding to $F_y$ matches the values at all the
fault-free nodes in X.

Thus, when the failure detection rule above detects a failure, a failure must have indeed
occurred. Also, while Claim 1 shows that the fault-free nodes in X will agree with each other
using the above decision rule, Observation 1 implies that fault-free nodes in $\bar{X}$ will also agree
with them.



3 Complexity Analysis 

We have finished describing the proposed consensus algorithm above. Now let us study the 
communication complexity of this algorithm. 

• In step 1, every node sends at most n - 1 symbols of D/(n - 2t) bits. So at most

$$\frac{n(n-1)}{n-2t} D \text{ bits} \tag{1}$$

are transmitted. Notice that this value decreases when both n and t are reduced by the
same amount. As a result, no more than $\frac{n(n-1)}{n-2t} D$ bits will be transmitted in step 1 when
some nodes are isolated.

• In step 3, every node broadcasts a "match" vector of n - 1 bits, using a traditional
Byzantine agreement (one-to-many) algorithm. Let B denote the bit complexity of
broadcasting 1 bit. So in step 3, at most

$$n(n-1)B \text{ bits} \tag{2}$$

are transmitted.

• If no X is found in step 4, the algorithm terminates and nothing more is transmitted.
So we only consider the case when X exists. As we have seen before, in step 5 every node
in $\bar{X}$ receives t symbols of D/(n - 2t) bits, which results in $\frac{t^2}{n-2t} D$ bits being transmitted.
Additionally, in step 6, every node in $\bar{X}$ broadcasts a 1-bit notification, which requires tB
bits being transmitted. So if no failure is detected, at most

$$\frac{t^2}{n-2t} D + tB \text{ bits} \tag{3}$$

are transmitted in steps 5 and 6. Again, this value decreases when both n and t are
reduced by the same amount. So if some nodes are isolated, fewer bits will be transmitted.

• If a failure is detected in step 6, every node broadcasts all symbols it has sent and has
received through steps 1 to 5. In step 1, n(n - 1) symbols are transmitted. In step 5, $t^2$
symbols are transmitted. So $2(n(n-1) + t^2)$ symbols are broadcast after a failure
is detected, which results in

$$\frac{2(n(n-1) + t^2)}{n-2t} DB \text{ bits} \tag{4}$$

being transmitted. Again, this value decreases when both n and t are reduced by the same
amount. So if some nodes are isolated, fewer bits will be transmitted.

Now we can compute an upper bound on the complexity of the proposed algorithm. Notice
that D bits are agreed on in every generation, so there are L/D generations in total.³
Thus, excluding the broadcast after failures are detected, no more than

$$\left( \frac{n(n-1)}{n-2t} D + n(n-1)B + \frac{t^2}{n-2t} D + tB \right) \frac{L}{D} \tag{5}$$

$$= \frac{n(n-1) + t^2}{n-2t} L + \frac{(n(n-1) + t) B L}{D} \text{ bits} \tag{6}$$

are transmitted. In addition, similar to our one-to-many algorithm, all faulty nodes will be
identified after failures are detected in at most (t + 1)t generations. So the "full broadcast" will
be performed at most (t + 1)t times throughout the whole execution of the algorithm. So the
total number of bits transmitted in the "full broadcast" in all generations is at most

$$\frac{2(n(n-1) + t^2)(t+1)t}{n-2t} DB. \tag{7}$$

The communication complexity of the proposed algorithm, denoted as C(L), is then upper
bounded by

$$C(L) \le \frac{n(n-1) + t^2}{n-2t} L + \frac{(n(n-1) + t) B L}{D} + \frac{2(n(n-1) + t^2)(t+1)t}{n-2t} DB. \tag{8}$$

For a large enough value of L, with a suitable choice of

$$D = \sqrt{\frac{(n(n-1) + t)(n-2t) L}{2(n(n-1) + t^2)(t+1)t}}, \tag{9}$$



3 To simplify the presentation, we assume that L is an integer multiple of D here. For other values of L, the
analysis and results are still valid by applying the ceiling function $\lceil \cdot \rceil$ to the number of generations.
we have

$$C(L) \le \frac{n(n-1) + t^2}{n-2t} L + 2 B L^{0.5} \sqrt{\frac{2(n(n-1) + t)(n(n-1) + t^2)(t+1)t}{n-2t}} \tag{10}$$

$$= \frac{n(n-1) + t^2}{n-2t} L + B L^{0.5}\, \Theta(n^{2.5}). \tag{11}$$
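The choice of D can be checked with a short calculation. As a sketch, write $a = n(n-1) + t$ and $b = 2(n(n-1) + t^2)(t+1)t/(n-2t)$ (shorthand introduced here for illustration, not notation from the report), treat D as a real number, and let f(D) collect the two terms of the bound on C(L) that depend on D:

```latex
\begin{align*}
f(D)   &= \frac{aBL}{D} + bDB, \\
f'(D)  &= -\frac{aBL}{D^2} + bB = 0
          \;\Longrightarrow\; D^{*} = \sqrt{\frac{aL}{b}}, \\
f(D^{*}) &= B\sqrt{abL} + B\sqrt{abL} = 2BL^{0.5}\sqrt{ab}.
\end{align*}
```

Substituting a and b into $D^{*}$ and $f(D^{*})$ gives the stated choice of D and the second term of the bound. When t grows proportionally to n, a is of order $n^2$ and b is of order $n^3$, so $\sqrt{ab}$ is of order $n^{2.5}$.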



Notice that deterministic broadcast algorithms of complexity $\Theta(n^2)$ are known [1], so we
assume $B = \Theta(n^2)$. Then the complexity of our algorithm for all t < n/3 is upper bounded by

$$C(L) \le \frac{n(n-1) + t^2}{n-2t} L + L^{0.5}\, \Theta(n^{4.5}) < \frac{10}{3} nL + L^{0.5}\, \Theta(n^{4.5}). \tag{12}$$
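The constant 10/3 in the last bound follows directly from t < n/3. A brief worked derivation, using only that assumption:

```latex
\begin{align*}
t < \frac{n}{3}
  &\;\Longrightarrow\; n - 2t > \frac{n}{3}
   \quad\text{and}\quad
   n(n-1) + t^2 < n^2 + \frac{n^2}{9} = \frac{10\,n^2}{9}, \\
\frac{n(n-1) + t^2}{n - 2t}
  &< \frac{10\,n^2/9}{n/3} = \frac{10}{3}\, n .
\end{align*}
```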

For a given network with size n, the per-bit communication complexity of our algorithm is
upper bounded by

$$\alpha(L) = \frac{C(L)}{L} \tag{13}$$

$$\le \frac{n(n-1) + t^2}{n-2t} + L^{-0.5}\, \Theta(n^{4.5}) \tag{14}$$

$$\to \frac{n(n-1) + t^2}{n-2t} = \Theta(n), \quad \text{as } L \to \infty. \tag{15}$$
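The convergence of the per-bit bound can also be checked numerically. The sketch below evaluates the upper bound on C(L) at the balancing choice of D and divides by L. The function names are hypothetical, D is treated as a real number (rounding and the divisibility of L by D are ignored), and B is set to exactly n^2 with constant factor 1, which is an assumption made only to obtain concrete numbers.

```python
import math


def optimal_generation_size(n, t, L):
    """Value of D that balances the two D-dependent terms of the bound."""
    a = n * (n - 1) + t
    b = 2 * (n * (n - 1) + t * t) * (t + 1) * t / (n - 2 * t)
    return math.sqrt(a * L / b)


def total_bound(n, t, L, D, B):
    """Upper bound on the total number of bits for generation size D."""
    lead = (n * (n - 1) + t * t) / (n - 2 * t) * L
    per_generation = (n * (n - 1) + t) * B * L / D
    full_broadcasts = (2 * (n * (n - 1) + t * t) * (t + 1) * t
                       / (n - 2 * t) * D * B)
    return lead + per_generation + full_broadcasts


def per_bit_bound(n, t, L, B):
    """Per-bit bound at the balancing choice of D."""
    D = optimal_generation_size(n, t, L)
    return total_bound(n, t, L, D, B) / L


n, t = 10, 3
B = n * n                                        # assumed constant factor 1
limit = (n * (n - 1) + t * t) / (n - 2 * t)      # limiting per-bit cost
for L in (1e6, 1e9, 1e12):
    print(L, per_bit_bound(n, t, L, B), limit)
```

For these parameters the limiting value is 24.75 bits per agreed bit, and the gap between the printed per-bit bound and this limit shrinks in proportion to $L^{-0.5}$.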